A Clustering Algorithm For Chinese Adjectives And Nouns

Authors

  • Yang Wen
  • Chunfa Yuan
  • Changning Huang
Abstract

This paper proposes a bidirectional hierarchical clustering algorithm that simultaneously clusters words of different parts of speech based on their collocations. The algorithm consists of cycles that alternate between two kinds of clustering processes. We construct an objective function based on the Minimum Description Length principle. To partly address the problem caused by sparse data, two concepts are introduced: collocational degree and revisional distance.
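
The alternating structure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' method: the toy adjective-noun count matrix is invented, the squared Euclidean distance between relative-frequency profiles is a stand-in for the paper's distance, and fixed target cluster counts replace the Minimum Description Length objective. Collocational degree and revisional distance are not defined in the abstract and are not implemented here.

```python
from itertools import combinations


def cluster_profile(counts, row_cluster, col_clusters):
    """Relative frequency with which the rows in `row_cluster` co-occur with each column cluster."""
    totals = [sum(counts[r][c] for r in row_cluster for c in cols) for cols in col_clusters]
    grand = sum(totals)
    return [t / grand if grand else 0.0 for t in totals]


def merge_closest(counts, row_clusters, col_clusters):
    """Merge the two row clusters with the closest profiles (assumed distance: squared Euclidean)."""
    profiles = [cluster_profile(counts, rc, col_clusters) for rc in row_clusters]

    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))

    i, j = min(combinations(range(len(row_clusters)), 2), key=lambda p: dist(*p))
    merged = row_clusters[i] + row_clusters[j]
    return [rc for k, rc in enumerate(row_clusters) if k not in (i, j)] + [merged]


def bidirectional_cluster(counts, n_adj_clusters, n_noun_clusters):
    """Alternate one adjective merge and one noun merge per cycle.

    Stopping at fixed cluster counts is an assumption of this sketch;
    the paper uses an MDL-based objective instead.
    """
    if n_adj_clusters < 1 or n_noun_clusters < 1:
        raise ValueError("target cluster counts must be at least 1")
    adj_clusters = [[i] for i in range(len(counts))]
    noun_clusters = [[j] for j in range(len(counts[0]))]
    counts_t = [list(col) for col in zip(*counts)]  # nouns as rows
    while len(adj_clusters) > n_adj_clusters or len(noun_clusters) > n_noun_clusters:
        if len(adj_clusters) > n_adj_clusters:
            # Adjectives are compared by how they distribute over the current noun clusters.
            adj_clusters = merge_closest(counts, adj_clusters, noun_clusters)
        if len(noun_clusters) > n_noun_clusters:
            # Nouns are compared by how they distribute over the current adjective clusters.
            noun_clusters = merge_closest(counts_t, noun_clusters, adj_clusters)
    return adj_clusters, noun_clusters


# Invented toy counts: rows are adjectives, columns are nouns.
toy_counts = [
    [5, 3, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 7, 5],
    [0, 0, 2, 6],
]
adj_groups, noun_groups = bidirectional_cluster(toy_counts, 2, 2)
print(adj_groups, noun_groups)
```

Because each side is re-described in terms of the other side's current clusters, a merge among nouns changes the profiles used for the next adjective merge, which is one plausible way such a scheme could reduce the effect of sparse counts.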


Similar resources

Exploring the value space of attributes: Unsupervised bidirectional clustering of adjectives in German

The paper presents an iterative bidirectional clustering of adjectives and nouns based on a co-occurrence matrix. The clustering method combines a Vector Space Model (VSM) with the output of Latent Dirichlet Allocation (LDA); the two sets of results are merged in each iterative step. The aim is to derive a clustering of German adjectives that reflects latent semantic classes of adjectives, and that can...


The Other Pole of Degree Modification of Gradable Nouns by Size Adjectives: A Mandarin Chinese Perspective

Size adjectives can have degree readings when they modify gradable nouns. However, a cross-linguistic variation exists with respect to what type(s) of size adjectives in a particular language can have such readings. In English degree readings are available only for size adjectives that predicate bigness, and in Mandarin Chinese degree readings are available for all size adjectives irrespective ...


Semantic Clustering in Dutch: Automatically inducing semantic classes from large-scale corpora

Handcrafting semantic classes is a difficult and time-consuming job, and depends on human interpretation. Possible machine learning techniques would be much faster, and do not rely on interpretation, because they stick to the data. The goal of this research is to present some machine learning techniques that make it possible to achieve an automatic clustering of Dutch words. More particularly, ...


Semantic Classification of Chinese Unknown Words

This paper describes a classifier that assigns semantic thesaurus categories to unknown Chinese words (words not already in the CiLin thesaurus and the Chinese Electronic Dictionary, but in the Sinica Corpus). The focus of the paper differs in two ways from previous research in this particular area. Prior research in Chinese unknown words mostly focused on proper nouns (Lee 1993, Lee, Lee and C...


Building a Chinese Lexical Taxonomy

In this paper, we present a Chinese lexical taxonomy, a hierarchical organization of Chinese lexical classes of nouns, verbs and adjectives. We first describe the structure of this taxonomy and then present the methods we used to build it. The distinctive characteristics of this lexical taxonomy are: 1) we use a definition frame to describe each lexical class, as well as its members, 2) the lex...




Publication date: 2000